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Abstract 


Although ligand-binding sites in many proteins contain a high number density of 
charged side chains that can polarize small organic molecules and influence binding, the 
magnitude of this effect has not been studied in many systems. Here, we use a quantum 
mechanics/molecular mechanics (QM/MM) approach in which the ligand is the QM 
region to compute the ligand polarization energy of 286 protein-ligand complexes from 
the PDBBind Core Set (release 2016). We observe that the ligand polarization energy 
is linearly correlated with the magnitude of the electric held acting on the ligand, the 
magnitude of the induced dipole moment, and the classical polarization energy. The 
influence of protein and cation charges on the ligand polarization diminishes with the 
distance and is below 2 kcal/mol at 9 A and 1 kcal/mol at 12 A. Considering both 
polarization and solvation appears essential to computing negative binding energies 
in some crystallographic complexes. Solvation, but not polarization, is essential for 
achieving moderate correlation with experimental binding free energies. 


1 Introduction 

Noncovalent binding to proteins is a key mechanism by which small organic molecules (lig¬ 
ands) interact with biological systems. Most drugs are noncovalent inhibitors of particular 
targets. Signaling molecules generally bind to specific receptors. Molecules with low solu¬ 
bility often bind to serum albumin. Even in enzymes, noncovalent binding of substrates is a 
prerequisite to catalysis. 

Many proteins generate a strong electrostatic potential that can influence ligand bind¬ 
ing. To promote stable folding, globular proteins typically consist of a hydrophobic core 
and hydrophilic surface. Many amino acids in the latter region are charged. Indeed, in an 
analysis of 573 enzyme structures, Jimenez-Morales et al. 1S observed a high number density 
of oft-charged acidic (aspartic and glutamic acid) and basic (lysine, arginine, and histidine) 
amino acids in catalytic sites (18.9 ± 0.58 mol L _1 ) and other surface pockets, including 
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ligand-binding sites (28.2 ± 0.34 mol L^ 1 ). For context, the number density of charges is 
2.82 ± 0.03 mol L _1 in entire proteins® and 74.3 mol L _1 in a sodium chloride salt crystal. 2 
Charged amino acid side chains generate patterns in the surrounding electrostatic potential 
that can have functional roles that include mediating associations with other proteins with 
complementary electrostatics and channeling charged enzyme substrates/ 3 Within a pro¬ 
tein, electrostatic forces can alter redox potentials, shift the pK a s of amino acid residues,® 
accelerate enzyme catalysis,^ 5 and polarize ligands.® 

The importance of ligand polarization in protein-ligand binding has been demonstrated 
by studies that compare results from similar models with and without polarization. Although 
the vast majority of current studies modeling biological macromolecules are based on fixed- 
charge molecular mechanics force fields, polarizable models are being actively developed. 78 
Jiao et al. 9 demonstrated that incorporating polarization into a molecular mechanics force 
held was essential to accurately computing the binding free energy between trypsin and 
the charged ligands benzamidine and diazamidine. Quantum mechanics (QM) and mixed 
quantum mechanics/molecular mechanics (QM/MM) methods have also been increasingly 
employed in predicting the binding pose — the configuration and orientation of a ligand in 
a complex — and binding affinity. 10 11 Semiempirical QM methods have shown particular 
promise in correctly distinguishing the native (near-crystallographic) binding pose from de¬ 
coy poses (non-native poses that have low docking scores) in diverse sets of protein-ligand 
complexes.^ 2 15 QM/MM methods usually couple the QM and MM regions via electrostatic 
embedding, in which charges from the MM region alter the Hamiltonian in the QM re¬ 
gion. Electrostatic embedding allows the QM region (which in most protein-ligand binding 
studies includes the ligand and sometimes surrounding residues) to polarize in response to 
charges in the environment. Clio et al. 16 demonstrated the importance of embedding by 
evaluating the ability of multiple docking schemes to recapitulate ligand binding poses in 
40 diverse complexes. They found that assigning ligand charges using a QM/MM method 
with electrostatic embedding was generally more successful than a gas-phase QM method 
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without embedding. Subsequently, Kim and Cho 1 ' performed a more systematic assessment 
focusing on 40 G protein-coupled receptor crystal structures. The QM/MM method out¬ 
performed (1.115 A average RMSD and RMSD<2 A in 36/40 complexes) a gas-phase QM 
method without embedding (1.672 A average RMSD and RMSD<2 A in 31/40 complexes) 
and a fixed-charge molecular mechanics method (1.735 A average RMSD and RMSD<2 A in 
32/40 complexes). Beyond the context of protein-ligand binding, the inclusion of the polar¬ 
ization energy has been shown to dramatically affect water density 18 and the structure and 
dynamics of solvated ions in water clusters.®®^ 

Ligand polarization effects have also been isolated using a decomposition scheme pio¬ 
neered by Gao and Xia®^, which was originally applied to the polarization of solutes by 
aqueous solvents. In this scheme, the polarization energy of molecule /, (Eq. [6]) , is the 
sum of the energy from distorting the wave function, 5^ lst (Eq. [8j, and the energy from 
stabilizing Coulomb interactions relative to the gas phase, Sj tab (Eq. [ 9 ]). For three high- 
affinity inhibitors of human immunodeficiency virus type 1 (HIV-1) protease, Hensen et al. 6 
found that the magnitude of the ligand polarization energy can be as large as one-third of 
the electrostatic interaction energy. Fong et al.®^ considered 6 ligands of HIV-1 protease in 
near-native poses and found that depending on the level of theory, the polarization energy 
is from 16% to 21% of the electrostatic interaction energy. 

Although comparative studies and energy decomposition schemes have strongly indicated 
the importance of ligand polarization, the magnitude of this term and the factors contribut¬ 
ing to the ligand polarization energy have not, to our knowledge, been investigated for many 
diverse systems. Here, we address this knowledge gap by calculating the ligand polariza¬ 
tion energy for 286 protein-ligand complexes from the PDBBind Core Set (release 2016) J®^ 
The PDBBind is a comprehensive database of complexes for which both Protein Data Bank 
crystal structures and binding affinity data are available. The Core set is a subset of the 
PDBBind with high-quality and non-redundant structures meant as a benchmark for molec¬ 
ular docking methods. The size and diversity of this dataset allow us to draw more general 
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and statistically meaningful conclusions about ligand polarization than previous efforts. 


2 Theory and Methods 


2.1 Energies 

We employed a QM/MM scheme in which the ligand is the QM region and other atoms are 
the MM region. To enable energy decomposition, the Schrodinger equation for the ligand 
was solved both in the gas phase and with electrostatic embedding. 

In the gas phase, the Hamiltonian operator Hj of a molecule / is, 
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where i and j are indices over all electrons and A and B are indices over all atoms in molecule 
I. pi is the momentum operator and m e is the mass of an electron. r t is the position of 
electron i, Ra is the position of atom A, and Za is the atomic number of atom A. rij is the 
distance between electrons i and j, and Rab is the distance between atoms A and B. The 
ground-state energy Ej of the molecule / is 


Ej = (tt/llf/l*/), 


( 2 ) 


where T/ is the electronic wave function of the molecule I. 

When the molecule / is placed in an embedding held Q j ~ {qf}, the Hamiltonian operator 
of the embedded molecule is given by Hj : q i — Hj E H[i/Q r ]. We will use I : Qi to denote 
the embedding of molecule I in the embedding held Qj. The Hamiltonian operator H^/q^ 
for Coulomb interactions between the molecule / (QM) and the held Qj (MM) is, 
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where F is an index over charges in the embedding held. The first summand describes 
electron-charge interactions and the second proton-charge interactions. The ground-state 
energy Ej.Qj of the embedded molecule I : Qi is obtained by 

Ei,Qi = {'S’ hQ^HhQ^ i, Ql ), (4) 

where ^i.Qj is the ground-state electronic wave function of the embedded molecule / : Qj. 
The embedding held should affect the ground-state wave function of the molecule such that 

I^IV l^/l 2 - 

We will use the symbol 5 to denote a difference between two expectation values. The 
electronic interaction energy describes the change in electronic energy of a molecule upon 
interaction with the embedding held, 

5/ lec = {*I:Q i \H I:Q i \* I:Qi ) - (5) 

Hensen et al. 6 decomposed Zf ec into the polarization energy of a molecule, 

Zf = (V I:Ql \H I:Ql \y I:Ql ) - ( 6 ) 

the difference in the expectation of Hi-.Qj between the gas phase and in the embedding held, 
and the Coulomb interaction energy between a molecule and the embedding held, 

Ewt=(*i\H [ i/ Ql ]\*i). (?) 

such that Sf lec = + Ef^. Hensen et al. 6 further decomposed the polarization energy 

into an energy of distorting the gas-phase wave function, 

Sf st = - (T / | J H / |T / ), (8) 
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and the energy of stabilizing interactions with the embedding held Qj = {qp}, 
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In a system of N molecules, the total electronic interaction energy and its decomposition 
into polarization and permanent Coulomb energies are, 
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In the Coulomb and polarization energies, if Coul , the factor of 1/2 is introduced to com¬ 
pensate for doubly counting the interaction energy. Like S po1 , S dlst and S stab are similarly 
defined as sums over all molecules. In our present scheme, only one molecule, the ligand, is 
treated quantum mechanically. Thus S elec = Sf lec , S po1 = S^ 01 , S dlst = S dlst , S stab = Hf ab , 
and E Coul = Ey// 1 , where / is the ligand molecule. 

Both T j and Tfqj were calculated using the restricted Hartree-Fock method 26 in con¬ 
junction with the 6-311G** basis set! 2 ' The atomic charge of atoms A with and without the 
embedding held Qj = and qf ^, respectively, were obtained by fitting to the 

quantum mechanical electrostatic potential (ESP) using the restrained electrostatic potential 
(RESP) method.® 

Fitted point charges were used to evaluate the stabilization energy, 
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For most reported calculations, the embedding held Qj = {^f} consisted of all of the 
non-ligand atoms in the system. In order to evaluate the distance at which embedding held 
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atoms affect the polarization energy, we also performed calculations in which the embedding 
held consists of all atoms within a cutoff parameter R cut of any ligand atom. The cutoff 
parameter was varied from R cnt £ {4, 5,10,12,20}. Even when different R cut were used 
for determining T/.q 7 and q® M ' Ql , energies were evaluated using an embedding held based 
on all atoms in the model. 

In addition to the electrostatic interaction energy, coupling between the QM and MM 
region also includes a van der Waals interaction energy modeled by the Lennard-Jones po¬ 
tential, 
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where a af and 6af are the Lennard-Jones parameters. Combined with E Cou \ E vdW makes 
up the intermolecular pairwise interaction energy, 
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For a system in solution, opposed to the gas phase, we also consider the solvation free 
energy. We will use W (A") to denote a solvation free energy, where X £ {PL, P , L} represent 
the complex, protein, and ligand, respectively. The solvation free energy is an integral over all 
the solvent degrees of freedom. In principle, there are many ways to compute this quantity. 
In this paper, we used the Onufriev Bashford Case 2 (OBC2) 29 generalized Born/surface 
area implicit solvent model. 

The total binding energy is given by (Figure [I]), 

vf bind = E pair + F po1 + IF billd , (16) 

W bind = W(PL)-W{P)-W(L). (17) 

In the W{PL) calculation, q^"® 1 are used for ligand partial charges. On the other hand, 






the W ( L ) calculation uses q® M for ligand partial charges. 



Figure 1: Schematic illustrating the decomposition of binding energy, 'b hmd , 
into desolvation free energy of the protein, — W(P), the desolvation free energy of the 
ligand, —W(L), the intermolecular pairwise interaction energy, E pan , the ligand 
polarization energy, S po1 , and the solvation free energy of the complex, W(PL). 


To isolate the effects of polarization, we also define a total binding energy that does not 
consider ligand polarization, 


^bind,np 
p^/-bind,np 

ppbmd.np differs f rom W bind because q are used for ligand partial charges in the W(PL ) 
calculation. \h bind ’ np is the binding energy for a purely MM model. 

2.2 Other Properties 

We computed a number of other properties to assess whether they have a clear relationship 
with the polarization energy. 

Motivated by the observation of a high density of acid and base side chains in enzymes,^ 
we computed two quantities: the percentage of atoms in a protein that are highly charged; 


^ipair _|_ j^bind,np 

(18) 

W(PL, np) - W(P) - W(L). 

(19) 
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and the number density of highly charged atoms within 6 A of any ligand atom. The 
percentage of atoms in the protein that are highly charged is defined as, 

1 N 

l«il - 0.6) x 100, (20) 

i 

where i is an index over atoms in the protein and N is the total number of atoms in the 
protein. This expression uses the Heaviside step function, 

{ 0 , x < 0, 

( 21 ) 

1 , x > 0, 

where a; is a real number. The volume of the binding site was determined by Monte Carlo 
integration. To perform this integration, a box was defined that includes 6 A around the 
range of the ligand atoms in each dimension. Points within the box were randomly sampled 
from a uniform distribution and assessed for the distance to the nearest ligand atom. The 
site volume was estimated by the product of the box volume and the fraction of points in 
the box within 6 A of a ligand atom. 

We also computed a number of properties inspired by classical electrostatics. In classical 
electrostatics, the internal energy of a dipole moment in an electric held is the dot product 
of the dipole with the held. We considered two classical models: one in which the entire 
ligand is treated as a dipole and a second in which each atom is treated as a dipole. 

If the ligand is considered as a dipole, the change in internal energy due to an induced 
dipole is, 


"Pol’CL = —/x“ ld - E° l , (22) 

where /i“ d is the induced dipole moment of the ligand L and is the electric held acting 
on the ligand L due to the embedding held Ql = {^f} consisting of atomic charges of the 
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surrounding atoms. The electric field acting on the center of mass (or protons) Rc of the 
ligand is, 


K = E tIT R cf. (23) 

raj, n CF 

where F runs over the atomic sites in the embedding field. Rcf = Rc~Rf and Rcf = \Rcf\- 
The induced dipole moment of the ligand was calculated in two ways. The first was 
from the expectation value of the dipole moments, 
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where jj, is the dipole moment operator. The second was based on the molecular polarizability 
tensor, oil, and the electric field on the center of mass of the ligand, 

/T L nd ’“ L = (25) 

Elements of the molecular polarizability tensor {pti)xy describe the susceptibility of a molecule 
to polarization along the x axis due to an electric field along the y axis. As in Willow et al. 
these tensor elements were calculated based on placing a pair of point charges of =)=1 a.u. 
at R cm ± 100 Bohr along a Cartesian axis, where R cm represents the center of mass of the 
ligand, to create an electric field. Then {oL L ) xy were evaluated as the ratio of the induced 
dipole moment due the point charges, /i“ d ’ pc , and the electric field applied by the point 
charge onto the ligand, E^' pc , 

= (^ d ’ PC ) 3 
( e ° l pc % 

The dipole moment from the electron density is more accurate and valuable for assessing 
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the correspondence between S po1 and S pol,c . However, it is not a practical shortcut to the 
polarization energy because it requires the same quantum chemistry calculations used to 
compute S po1 . On the other hand, although the molecular polarizability tensor, oil, requires 
three quantum chemistry calculations, it can be reused (as an approximation) for multiple 
ligand configurations. Hence, the dipole moment from the molecular polarizability tensor, 
, could potentially reduce the computational costs of S po1 prediction. To facilitate 
comparison with the polarization energy, we also computed the molecular polarizability 
scalar of the ligand, or,, defined as, 


at = .,Tr Q/J. 


(27) 


where Tr is the trace of a square matrix. 

If each atom on the ligand is considered as a dipole, then the change in internal energy 
due to an induced dipole is, 


;pol,cA _ 
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The electric held acting on an atom is, 
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where A runs over all atomic sites in the ligand. The induced dipole on each atom was 
computed based on RESP charges as, 
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In all, we considered the relationship between S po1 and a number of other properties: the 


1. percentage of highly charged atoms in a protein (Eq. 20); 
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2. molecular polarizability scalar, (Eq. 27); 


3. Coulomb interaction energy, E Coul (Eq. |t|) ; 

4. magnitude of the electric field on the ligand center of mass, |E 3 |, where E° is from Eq. 
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5. magnitude of total electric held on the ligand atom sites, I 'Yh AeL E^l, where E° 4 is from 


Eq. 29 


6 . magnitude of the induced dipole moment based on wave functions, |/x™ d,< ^ M |, where 
(i “id’QM i s from Eq. 
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7. magnitude of the induced dipole moment based on the molecular polarizability tensor, 
|^i“ ld ’" L |, where fi l ^ d,otL is from Eq. 
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8 . classical polarization energy of a ligand dipole, S pol,cL (Eq. 22), using Eq. 24 for the 
induced dipole moment. 


9. classical polarization energy of a ligand dipole, E pol ’ cL ’ QL (Eq. 22), using Eq. 25 for the 
induced dipole moment. 


10. and classical polarization energy of atomic dipoles, S pol,cA (Eq. 28). 


2.3 Computational Methods 

Structures from the PDBBind Core Set (release 2016) were processed through an automated 
workflow based on Amber Tools 17® 2 and customized QM/MM codes. Protein protonation 
states were assigned using PDB2PQR 1.9.0 at a pH of 7.0 and ligand protonation states 
using pkatyper in the QUACPAC 1.7.0.2 toolkit (OpenEye). AMBER topology hies based 
on protein and cation parameters (Na + , Mg 2+ , Ca 2+ , and Zn 2+ ) from the AMBER ffl4SB 
force held 33 and ligand parameters from the Generalized AMBER Force Field 2 30 were built 
using Amber Tools 17. 32 
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Using OpenMM 7 . 3 . 1,®1 complexes in OBC2 29 generalized Born/surface area implicit 
solvent were minimized with heavy atom restraints of 2 kcal/mol/A 2 towards crystallographic 
positions until energies converged within 0.24 kcal/mol. 

In our modified QM/MM codes, the evaluation of molecular integrals of many-body op¬ 
erators over Gaussian functions were obtained using libint 2.5.0 35 and the linear algebra and 
eigenvalue decomposition of a symmetric matrix were done with the Armadillo 8.500.1.®® 

OpenMM 7.3.1® was also used to evaluate van der Waals and solvation energies, the 
latter with the OBC2 29 generalized Born/surface area implicit solvent model. 

3 Results and Discussion 

3.1 The distribution of polarization energy is broad and skewed 

Signs of the calculated polarization energy S po1 , the distortion energy S dlst , and the stabiliza¬ 
tion energy S stab are mostly as expected (Fig. [2|. In nearly all of the calculations, S po1 < 0, 
S dlst > 0, and S stab < 0. The embedding field reshapes the wave function to have stronger 
Coulomb interactions between the electronic probability density and point charges, such that 
S stab < 0. Because the gas-phase wave function of the ligand has the optimal intramolecular 
potential, perturbing the wave function leads to a higher intramolecular potential energy 
such that S dlst > 0. In the vast majority of systems, the calculated distortion is more than 
compensated for by the calculated stabilization such that the calculated net effect on the 
interaction energy due to polarization, S po1 , is negative. 

Exceptions to the trend of negative calculated ligand polarization energies are due to 
structural modeling issues that lead to short intermolecular distances. Positive S po1 values 
were calculated in three complexes. In our models of these structures, there are very short 
distances between a hydrogen atom in the ligand and in the protein: 0.73 A in 2fxs, 1.06 A in 
3u5j, and 0.87 A in 4f2w. The close proximity of atoms leads to a severe distortion in the 
wave function that is not overcome by more favorable Coulomb interactions. These steric 
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Figure 2: Histograms of the ligand polarization (top, S po1 ), distortion (middle, S dlst ), and 
stabilization (bottom, S stab ) energies in the PDBBind Core Set. The three quantities are 
related by 5 po1 = 5 dlst + 5 stab . 


15 


















clashes could be resolved by changing the models in minor ways that are equally compatible 
with crystallographic evidence and pKa predictions. In the 2fxs and 4f2w models, the proton 
on a carboxylic acid was arbitrarily placed near a ligand hydrogen instead of on the other 
carbonyl oxygen. In the 3u5j model, the clash could be resolved by switching the position 
of the terminal oxygen and amine groups, which have nearly identical electron density, on 
asparagine 140. 

The distribution of S po1 , S dlst , and S stab is broad and skewed. There is a peak in the 
distribution of S po1 around -10 kcal/mol. However, for a small number of complexes, S po1 is 
much lower, with a minimum value of -128 kcal/mol. 

3.2 Systems with the lowest H po1 have close cations 

We hypothesized that the lowest S po1 could be due to crystallographic cations. To test 
this hypothesis, we subdivided the PDBBind Core Set into two subsets: 90 complexes with 
cations (Na + , Mg 2+ , Ca 2+ , and Zn 2+ ) and 196 complexes without cations in the crystal 
structure. 

Histograms of S po1 for the two subsets are consistent with our hypothesis (Fig. SI in the 
Supporting Information). All systems in which S po1 < -50 kcal/mol are in the subset with 
cations. In contrast, the minimum S po1 in the subset without cations is around -40 kcal/mol. 
The range of S dlst and S stab is also much smaller in the subset without cations. 

Crystallographic cations may have an outsize role in ligand polarization because the mag¬ 
nitude of their charge is larger than the charge of most protein atoms. In the AMBER ffl4SB 
force held,® protein partial charges were determined by applying RESP 28 to electrostatic 
potentials from QM calculations. Most protein atoms have near-zero charge. The magnitude 
of the charge is greater than 0.6e, where e represents the elementary charge, in only a few 
atoms. It is less than le in all atoms. These conclusions are also true for protein atoms in our 
data set (Fig. S2 in the Supporting Information). The low magnitude of charge results from 
delocalization of net charges across several atoms. In contrast, the cations have a charge 
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of +le or +2e that is localized onto a single atom and have a more focused effect on the 
electrostatic potential. 

Beyond the presence of cations, the distance between ligand and cation atoms also plays 
an important role in ligand polarization (Fig. [3]). Even if cations are present in a crystal 
structure, they are not necessarily close enough to the ligand to significantly polarize its 
wave function. In many systems, cations are over 10 A from any ligand atom. In all of the 
complexes in which S po1 < -50 kcal/mol, a cation is within 4 A of a ligand atom. 
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plot of the ligand polarization energy S po1 as a function of the minimum 
a ligand and cation atom, R min , for (a) the entire range of R min and (b) 


Unfortunately, the extent of ligand polarization when ligands are close to cations is likely 
overestimated by our QM/MM scheme. Because only the ligand is included in the QM region, 
cations are simply represented as positive point charges. While actual cations have inner- 
shell electrons that repel further electron density, the point charges are purely attractive. 
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The purely attractive forces draw an unrealistic amount of electron density between the 
ligand and cation, leading to a very negative polarization energy. For an estimate of the 
extent of overpolarization in several systems, see Table SI in the Supporting Information. 
As an illustrative example, there is a significant gain in the electron density between the 
ligand and cation in the complex 3dxl (Fig. [4]). Hence, we will proceed with extra caution 
in interpreting points where S po1 < -50 kcal/mol. 



(a) (b) 


Figure 4: (a) The molecular structure of the ligand with one Zinc cation Zn 2+ in the complex 
3dxl. Hydrogen, carbon, nitrogen, oxygen, and zinc atoms are colored with white, gray, blue, 
red, and green, respectively, (b) The difference in the electronic probability density is plotted. 
Blue and red contours illustrate the gain and loss of the electronic probability density due 
to the embedding field. 

3.3 The importance of the embedding field size diminishes with 
distance 

The size of the embedding field strongly affects estimates of the polarization energy (Fig. 
[5]). Changes in the cutoff distance R cut alter the partial charges included in the embedding 
field, the wave function and then the RESP charges. Regardless of R cut , nearly every 

estimate of AH pol (i? cut ) = 5 pol (R cut ) — 5 pol (i? cut = 00 ) is positive, indicating that the ligand 
wave function accommodates even distant charges in the embedding field. However, the 
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influence of protein and cation charges diminishes with distance. Correspondingly, as AS po1 
diminishes, so does its variance. For larger values of R cut = 8, 9, 10, and 12 A, the mean 
(and standard deviation) of AS po1 is 1.81 (1.77), 1.49 (1.80), 1.10 (1.23), and 0.92 (1.14) 
kcal/mol, respectively. 

The decomposition of the polarization energy into E Coul and E dlst is more sensitive to 
R cut than the polarization energy itself; distributions of the values (relative to values with 
no cutoff) and numerical derivatives are broader. Even at R cut = 8, 9, 10, and 12 A, the 
mean (and standard deviations) of AE Coul are -2.28 (4.94), -1.46 (4.71), -1.18 (4.06), and 
-0.99 (3.38) kcal/mol. 

On average, the decay of AS po1 is well-described by an inverse square law. A nonlinear 
least-squares regression using scipy.optimize.curve_fit (https: //scipy. org/) of XiR~ 2 t + x 2 
for xi and x 2 yielded a curve that closely matches the data. The curve is best for low i? cut , 
slightly underestimates the mean for intermediate R cut , and slightly overestimates the mean 
for larger i?, cut . The inverse square power law is consistent with the dependence of 
ion-induced dipole interactions because the volume of the region containing embedding field 
charges increases as R 2 ut . 

3.4 Of computed properties, 2 po1 is most correlated with the electric 
field, the induced dipole moment, and the classical polarization 
energy 

We observed that a number of properties - the percentage of atoms in a protein that are 
highly charged, the number density of highly charged atoms, and the Coulomb interaction 
energy - have little or only weak correlation with the ligand polarization energy (Figure S3 
in the Supporting Information). 

In contrast with the aforementioned properties, there is a much clearer relationship be¬ 
tween the ligand polarization energy, S po1 , and several other properties: the magnitude of the 
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Figure 5: Dependence of the Coulomb interaction E Cou \ the ligand polarization energy S po1 , 
and the distortion energy S dlst on the cutoff distance i? cut . Here, the deviation and the 
gradient are defined as A F(R cat ) = F(R cut ) — F( oo) and G = dF(R cnt )/dR cut , respectively, 
where F is either E or S. In these violin plots, the width of the shaded area is proportional 
to the frequency of observations. Large blue points are placed at mean values. In the plot of 
AH po1 as a function of R cut , the green line is a function that was fitted to the mean values, 

80.778i? c “t + 0.177. 
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electric field; the magnitude of the induced dipole moment of the ligand; and the classical 
polarization energy (Fig. [h]). The linear correlation is strong with the magnitude of the 
electric held on the ligand center of mass, [E°|, and even stronger with the magnitude of 
the total electric held vector active on all ligand atoms, (Fig. [6] and S5 in the 

Supporting Information). Intriguingly, in both cases, there appear to be two distinct trends 
relating the electric held to the magnitude of the electric held; a linear correlation exists 
in systems where S po1 < -50 kcal/mol, but the slope is distinct from in systems where -50 
kcal/mol < S po1 < 0 kcal/mol. The two measures of the electric held are also correlated with 
each other, with a Pearson’s R of 0.54 (Fig. S6 in the Supporting Information). Similarly, 
the ligand polarization energy S po1 is also strongly correlated with the magnitude of the 
induced dipole moment of the ligand. There is a stronger correlation with the magnitude 
of the induced dipole moment based on wave functions |/z™ d,< ^ M |, where /// l<Lf ' XI is from Eq. 


24 than the magnitude of the induced dipole moment based on the molecular polarizability 


tensor, |/x“ d ’ QL |, where /F/' ri,a '' is from Eq. 25 (Fig. [6] and S7 in the Supporting Information). 

Finally, in addition to the strong relationship between the ligand polarization energy 
S po1 and both the magnitude of the electric held and the induced dipole, there is also a clear 
correspondence between the ligand polarization energy S po1 and the classical polarization 
energy. Of approaches to compute the classical polarization energy, treating the entire ligand 


as a dipole and using Eq. 24 for the induced dipole moment led to the best correlation with 
the quantum polarization energy (Fig. [6] and S8 in the Supporting Information). The 
clear correlation between the two quantities suggests that the classical model of a dipole 
in an electric held is a reasonable explanation for the quantum behavior. Limitations of 
the molecular polarizability model are described in Fig. S9 and S10 in the Supporting 
Information. 

The observed linear correlation between the ligand polarization energy and the magnitude 
of the electric held |E°| (Fig. [6]) has potential implications for modeling protein-ligand 
interactions with MM, including molecular docking. Because |E ( /| is computed without a QM 
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Figure 6: The ligand polarization energy, S po1 , as a function of the magnitude of the electric 
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(top), the magnitude of the induced dipole moment 
classical polarization energy S pol,cL (bottom), where E p , and S pol,cL are from Eq. 

23, Eq. 24, and Eq. 22, respectively. The range of S po1 is either S po1 < 0 kcal/mol (left) or 


-50 kcal/mol < E po1 < 0 kcal/mol (right). 
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calculation, a relatively inexpensive polarization energy estimate based on linear regression 
can be added to binding energy estimates. Such an approach could recapitulate some of the 
success of semi-empirical QM in reconstructing binding poses 

3.5 Polarization is a substantial and variable fraction of interaction 
and binding energies 

We observe that the ligand polarization energy S po1 can be a substantial and highly system- 
dependent fraction of the interaction energy and binding energy (Fig. [7]). In most systems 
where -50 kcal/mol < S po1 < 0 kcal/mol, the ratio £ pol /S elec ranges from 0 to 0.4 (Fig. [7^). 
Exceptions occur when 5 elec is positive, leading to a negative ratio, or when it is small, leading 
to a ratio much larger than 1 (Table S2 in the Supporting Information). Positive and small 
values of H elec result from positive E Coul . For example, the complex 5c2h has S po1 = —22.45 
kcal/mol, E Coul = 19.60 kcal/mol, and S elec = —2.85 kcal/mol. Hence, S pol /S elec = —7.88. 
The histogram of S pol /(i7 pair + S po1 ) is compressed compared to S pol /S elec , with the range 
with the largest density reduced to between 0 and 0.2 (Fig. IH). Smaller ratios are due 
to the addition of van der Waals interactions that increase values in the denominator. The 
histograms of 5 pol /(T bmd ’ np _|_ ~;p o1 ) and S pol /'F bmd is notable for a clear peak around 0.2 
(Fig. [7j & d). If all systems in the PDBBind are considered, qualitative trends are similar 
but there is increased density at higher ratios (Fig. Sll in the Supporting Information). 

When considering the polarization energies of three HIV-protease inhibitors, Hensen 
et al. 6 found that S po1 can approach one-third of the electrostatic interaction energy. In 
our much larger data set, we found that S po1 can be a larger fraction of S elec . 

With the caveat that polarization could be overestimated in these cases, two examples 
where S pol /(T bmd ’ np + S po1 ) is particularly large, 3dxl and 3dx2, highlight the potentially 
outsized importance of S po1 for small ligands (Table S2 in the Supporting Information). 
In 3dxl, \j/ bind > n P -p 2 po1 = —12.953 kcal/mol, E po1 = —80.77 kcal/mol, and the ratio is 
S p oi/(^bind,np + S poi) = 6 . 2 36. For comparison, in 2zcq, T bind ’ np + 5 po1 = -295.48 kcal/mol, 
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Figure 7: Histograms of ratio of the polarization energy of the ligand to (a) the electrostatic 
interaction (S elec = E Coul + S po1 ), (b) the intermolecular pairwise potential energy with the 
ligand polarization energy (_E pair + S po1 ), (c) the binding energy without considering ligand 
polarization in the solvation free energy (d/ bind > n P + S po1 ), and (d) the binding energy with 
considering ligand polarization in the solvation free energy ( v h bind ). The histograms are 
truncated at a ratio of 1.25. Data are only included for complexes where S po1 < 0 kcal/mol 
(left) or -50 kcal/mol < E po1 < 0 kcal/mol. For analogous histograms including all data, see 
Fig. Sll in the Supporting Information. 
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S po1 = —128.01 kcal/mol, and the ratio is H pol /(\I/ bind ’ np + S po1 ) = 0.43. The ligand in 
3dxl (Fig. [4]) is much smaller than the ligand in 2zcq (Fig. [8]). Small ligands have fewer 
opportunities for pairwise contacts with their protein binding partners than larger ligands. 
The limited number of contacts leads to a weaker 4/ bmd,np , such that S po1 can play a larger 
role. 



Figure 8: The molecular structure of the ligand with two Magnesium cations Mg 2+ in the 
complex, 2zcq. Hydrogen, carbon, oxygen, magnesium, phosphorus, and sulfur atoms are 
colored with white, gray, red, pink, orange, and yellow, respectively. 

The relative importance of ligand polarization in small ligands may explain the poor per¬ 
formance of binding free energy methods based on a fixed-charge force field in distinguishing 
molecules that are active and inactive against T4 lysozyme L99A. 38 I 11 this protein, the L99A 
mutation forms a pocket known to bind a number of small hydrophobic compounds. Xie 
et al. 38 performed binding free energy calculations for a library of 141 small hydrophobic 
compounds whose thermal activity against T4 lysozyme L99A had been measured. Many of 
the compounds contained highly polarizable aromatic groups. The best-performing method 
in Xie et al. 38 had an area under the receiver operating characteristic curve of 0.74 (0.04) 
out of 1 for a perfect binary classifier. Binary classification performance could potentially 
be improved by incorporating the ligand polarization, as described in the current paper. 
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3.6 Solvation and polarization can be key drivers of native complex 
formation 

For a number of native complexes, both polarization and solvation were required to com¬ 
pute negative binding energies (Fig. [9]). Due to the harmonic restraint maintained during 
minimization, our models closely resemble their native crystal structures. In order for these 
protein-ligand complexes to adopt these structures, they should have a negative binding 
energy (presuming that binding results in entropy loss). If calculated binding energies for 
experimentally observed structures are positive, it suggests a structural modeling issue (e.g. 
protonation state) or that critical phenomena are not properly described. Intriguingly, the 
Coulomb interaction energy is positive in a significant fraction of these systems (Fig. 00 - 
Incorporating van der Waals interactions in U pair slightly reduces the number of systems in 
which the interaction energy is positive (Fig. 00 - However, these pairwise terms, which 
are standard to molecular docking, are insufficient to accurately describe all the native com¬ 
plexes with a negative interaction energy. Incorporating a ligand polarization term (Fig. 0 > 
&[9]i) or solvation energy term (Fig. [9j? & g) alone is also insufficient. However, when both 
polarization and solvation are considered, all the native complexes have a negative binding 
energy (Fig. [9]f & h). Considering both polarization and solvation terms also appears to 
attenuate the broad range of binding energies observed in U pair , E pair + S po1 , and vj bind ’ np 
(Fig. [9]f & h). Using the partial charges q^ M opposed to q^ M : Qi does not have a quali¬ 
tative effect on these trends. The trends also hold for systems within the normal range of 
—50 < S po1 < 0 kcal/mol (Fig. S12 in the Supporting Information). The importance of 
including ligand polarization and solvation was previously noted by Kim and Cho 1 ^ who 
achieved superior performance at binding pose prediction using a protocol that combined 
atomic charges from QM/MM with solvation compared to using either by themselves. 

In cases where the pairwise interaction energy is positive, the change in solvation free 
energy upon binding can still be negative. Although the electrostatic component of the solva¬ 
tion energy change should have the same sign as the Coulomb interaction energy, formation 
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Figure 9: Histograms of intermolecular potential energies: (a) the permanent Coulomb inter¬ 
action (FZ 00111 ), (b) the electrostatic interaction (S elec = E Coul + S po1 ), (c) the intermolecular 
pairwise potential energy (FZ pair = E vdW + FZ Coul ), and (d) the intermolecular pairwise po¬ 
tential energy with the polarization energy of the ligand (FZ pair + S po1 ) in the gas phase. 
Histograms of binding energies: the binding energy (e) without considering ligand polariza¬ 
tion at all, 'h bmd ’ np , and (f) considering ligand polarization for electrostatic interactions but 
not in the solvation free energy, vj/ bmd ’ np -f S po1 , (g) considering ligand polarization in the 
solvation free energy but not for electrostatic interactions, vj/ bind _ S po1 , or (h) considering 
ligand polarization both in the electrostatic interactions and the solvation free energy. A 
similar plot that only considers systems for which —50 < E po1 < 0 kcal/mol is available as 
Fig. S12 in the Supporting Information. 
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of a complex reduces the solvent-accessible surface area. An illustrative example occurs in 
the complex lz9g. The ligand in this complex is soluble due to a 3-oxopropanoic acid moiety 
and other hydrophillic components, but it contains an aromatic ring that is buried upon 
binding to the protein. The decreased solvent-accessible surface area, especially around the 
aromatic moiety, reduces ordering of solvent in the vicinity of the solute and thereby leads 
to an increase in entropy. 

The lowest E Coul are due to phosphate groups. The lowest E Coul is observed in the 
complex 2zcq. The complex contains two Mg 2+ in close proximity to a negatively-charged 
phosphate group (Fig. [8]). The complex lulb also has a very low E Cou[ . The ligand in lulb 
contains four phosphates (Fig. S13 in the Supporting Information). RESP charges on the 
phosphorus are around 1.4 e and oxygen charges range from -0.4 to 0.8 e, leading to a low 


3.7 Solvation but not polarization improves correlation with exper¬ 
imental binding free energies 

An important goal in protein-ligand modeling is the accurate calculation of binding free 
energies - which quantify the strength of noncovalent association - that are consistent with 
experimentally observed values. 

For several reasons, the computed binding energy AG bmd is not expected to completely 
agree with the experimentally measured binding free energy AG bmd for complexes in the 
PDBBind Core Set. These reasons include that: 

• The binding free energy AG bmd is not rigorously equivalent to T bmd , but is actually 
an exponential average over the ensemble of the complex. 3940 Using 4/ bmd to model 
AG^md is an approximation that neglects entropy. 

• The binding energy model is not exact. For example, the present model does not 
explicitly treat polarization of the free ligand by solvent, polarization of the protein by 
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the ligand, and the solvation model does not include explicit water. 

• The PDBBind is a heterogeneous data set in which experimental AG bind were deter¬ 
mined by various modalities and under different experimental conditions. There may 
be systematic differences between measured AG bmd that are not considered in our 
models. 


• On a related note, experimental conditions used to obtain crystal structures and bind¬ 
ing affinity data are different. Crystal structures have packing forces and are generally 
at a lower temperature. 


Nonetheless, a comparison between computed interaction energies and experimental binding 
free energies can be informative. 

While the treatment of solvation is essential, ligand polarization energies have a minimal 


effect on the correlation between vj/ bmd and experimental AG bmd (Fig. 10 and Fig. S14 
in the Supporting Information). If solvation energies are not considered, the distribution of 
intermolecular pairwise potential energies E pair of the protein-ligand complexes is distributed 
extremely broadly from -1000 kcal/mol to 250 kcal/mol and the correlation between vj/ bind 


and experimental AG bind is negligible (Fig. lOr&b and Fig. S14a&b in the Supporting 
Information). Incorporating solvation but not polarization significantly improves Pearson’s 
R to 0.47 for complexes where -50 kcal/mol < S po1 < 0 kcal/mol and 0.40 for complexes where 


S po1 < 0 kcal/mol (Fig. 10: and Fig. S14c in the Supporting Information). Although the 
range of computed binding energies is dramatically reduced to -200 kcal/mol to 0 kcal/mol, 
it is still very broadly distributed compared to the distribution of experimentally measured 
binding free energies (-16 kcal/mol < AG bind < -3 kcal/mol), supporting the idea that a single 
structure cannot represent an ensemble of structures obtained in experimental conditions. 
Adding the polarization energy to solvation energies computed without solvation has no 
effect on the solvation energy (Fig. [Toj i and Fig. S14d in the Supporting Information). 
In comparison, computing solvation energies using partial charges from the induced dipole 
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lOs&f and Fig. S14e&f in the Supporting 


diminishes correlation with experiment (Fig. 

Information). 

In a critical assessment of a number of docking programs and scoring functions across 
eight different diverse proteins, Warren et al.concluded that “no statistically significant 
relationship existed between docking scores and ligand affinity.” Our data suggest that the 
lack of correlation stems from a poor or nonexistent treatment of solvation in the scoring 
functions. Perhaps due to cancellation of error, neglect of ligand polarization does not appear 
to be a major factor in the poor performance of docking scores. 

4 Conclusions 

Using a QM/MM approach., 1 6 1 23142 43 we computed polarization energies S po1 for 286 com¬ 
plexes in the PDBBind Core Set! 25 The distribution of 5 po1 , S dlst , and S stab were found to be 
broad and skewed. For properly prepared systems without atoms in unrealistically close con¬ 
tact, these terms all have the expected sign of S po1 < 0, S dlst > 0, and S stab < 0. The lowest 
S po1 were observed in systems where cations are close to ligand atoms. In these systems, the 
extent of polarization is likely to be overestimated. The importance of including embedding 
held charges on S po1 appears to diminish, on average, as an inverse square law. There is no 
clear relationship between S po1 and the percentage of highly charged atoms in a protein and 
molecular polarizability scalar. There is a weak correlation between S po1 and the Coulomb 
energy E Coul . On the other hand, there is a stronger linear correlation between S po1 and 
the magnitude of the electric held, the magnitude of the induced dipole moment, and the 
classical polarization energy. The ligand polarization energy S po1 is observed to a substantial 
and system-dependent fraction of the electronic interaction energy and the total interaction 
energy. In some systems, consideration of ligand polarization and solvation are both essential 
for calculating negative interaction energies for crystallographic complexes. While consider¬ 
ation of solvation is essential for achieving moderate correlation between interaction energies 
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Figure 10: Comparison of interaction energies to experimentally measured binding free en¬ 
ergies for complexes with -50 kcal/mol < S po1 < 0 kcal/mol. Interaction energies are the (a) 
the intermolecular pairwise potential energy (E pair = A vdW + A Coul ); (b) the intermolecular 
pairwise potential energy with the polarization energy of the ligand (i? pair + S po1 ) in the gas 
phase; the binding energy (c) without considering ligand polarization at all, vp bmd > n P ; and 
(d) considering ligand polarization for electrostatic interactions but not in the solvation free 
energy, vJ/ bind > n P _|_ S po1 , (e) considering ligand polarization in the solvation free energy but 
not for electrostatic interactions, vp bmd _ S po1 , or (f) considering ligand polarization both in 
the electrostatic interactions and the solvation free energy. A similar plot for all complexes 
is available as Fig. S14 in the Supporting Information. 
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and experiment, we did not observe that the ligand polarization energy S po1 improves the 
correlation between the binding energy and experimental binding free energies. 
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• Fig. S9. The structure of the complex 3tsk of human matrix metalloprotease-12 
(MMP12) in complex with L-glutamate motif inhibitor. 

• Fig. S10. Schematic of protein-ligand complexes in which the ligand center of mass is 
inside the ligand or the protein. 

• Fig. Sll. Histograms of ratio of the polarization energy of the ligand to intermolecular 
potential energies and binding energies in all complexes where S po1 < 0 kcal/mol. 

• Fig. S12. Histograms of intermolecular potential energies and binding energies in 
systems where -50 kcal/mol < S po1 < 0 kcal/mol. 

• Fig. S13. The structure of the complex lulb of bovine pancreatic Ribonuclease A with 
a ligand. 
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Supporting Information: Histograms 
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Figure SI: Histograms of the ligand polarization (top, S po1 ), distortion (middle, S dlst ), and 
stabilization (bottom, S stab ) energies in the PDBBind Core Set for systems with (left) and 
without (right) cations. The three quantities are related by S po1 = 5 dlst + S stab . 
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Figure S2: Normalized histogram of partial atomic charges for protein atoms in the data set. 
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Supporting Information: Correlations 


The percentage of atoms in a protein that are highly charged does not appear to be a 
significant factor in the ligand polarization energy (Fig. [S3] ) . In all the systems, only a small 
percentage of atoms (less than 8%) have atomic charges of \q\ > 0.6. 

The number density of charged atoms is more related with the ligand polarization energy. 
However, we only observed weak correlation, with Pearson’s R being 0.32 for all complexes 


and 0.33 for complexes for which 5 po1 > —50 kcal/mol (Fig. S3). 

It would be reasonable to think that the polarization energy is related to the Coulomb 
interaction energy. However, the correlation is also weak, with Pearson’s R being 0.29 for 


all complexes and 0.25 for complexes for which S po1 > —50 kcal/mol (Fig. S3). 
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Figure S3: The polarization energy 5 po1 as a function of the percentage of charged atoms 
(top), the number density of highly charged atoms (middle), and Coulomb energy E Coul . 
Data are included for complexes with S po1 < 0 kcal/mol (left) or only for complexes with -50 
kcal / mol < 5 po1 < 0 kcal/mol (right). The number density of charged atoms in a binding 
site is defined as the number of charged atoms with |g| > 0.6 divided by the volume of the 
site. The volume of the site is the region within 6 A of any ligand atom. 
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The molecular polarizability scalar of ligand molecules (oil) has a strong linear correlation 


with the number of electrons in the system (Fig. S4). This observation is reminiscent of 
one of the properties of halide anions, whose polarizabilities are observed in the following 
order: F < Cl - < Br _ < I - . However, there is no clear relationship between the molecular 
polarizability scalar and the ligand polarization energy. 



Figure S4: Polarizability of the ligand (a^) versus the the number of ligand electrons (N e i ec ) 
in ligands from the protein-ligand complexes. 
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In contrast with the aforementioned properties, there is a much clearer relationship be¬ 


tween the ligand polarization energy, S po1 , and the magnitude of the electric held (Fig. S5). 
The linear correlation is strong with the magnitude of the electric held on the ligand center 
of mass, |E°|, and even stronger with the magnitude of the total electric held vector active 
on all ligand atoms, | Xmei. | • Intriguingly, in both cases, there appear to be two distinct 
trends relating the electric held to the magnitude of the electric held; a linear correlation 
exists in systems where S po1 < -50 kcal/mol, but the slope is distinct from in systems where 
-50 kcal/mol < S po1 < 0 kcal/mol. The two measures of the electric held are also corre¬ 


lated with each other, with a Pearson’s R of 0.54 (Fig. S6). In general, the magnitude of 
is greater than the magnitude of S po1 , suggesting that electric held vectors on 
individual atoms generally point in a similar direction. 
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Figure S5: The ligand polarization energy, S po1 , as a function of the magnitude of the electric 
field. The electric field vector is either on the ligand center of mass, |E°|, where E 1 ^ is from 
Eq. 22 (top) or the sum of vectors on the ligand atom sites, | Xmer ®A |j where E° 4 is from Eq. 
29 (bottom). The range of S po1 is either S po1 < 0 kcal/mol (left) or -50 kcal/mol < S po1 < 0 
kcal/mol (right). 
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Figure S6: Comparison of electric field estimates. The magnitude of the electric 
field on the ligand, |E°|, versus of the vector sum of the electric field on all ligand atoms, 
IE^xE“| (top). The magnitude of the induced dipole based on the molecular polarizability 
tensor, | / u“ d ’“ L | versus of the induced dipole based on the dipole operator, |/i™ d ’^ M (bottom). 
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Similarly, the ligand polarization energy S po1 is also correlated with the magnitude of the 
induced dipole moment on the ligand. There is a stronger correlation between the ligand 
polarization energy S po1 and the induced dipole based on the wave functions ^ mi -Q M (Eq. 


23) than the induced dipole based on the molecular polarizability tensor (Fig. ST). 

The latter quantity, which is ultimately based on three pairs of point charges, does 

not perfectly recapitulate polarizability of the more complex embedding field; Pearson’s R 
is 0.62 (Fig. [S6}) 
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Figure S7: The ligand polarization energy, S po1 , as a function of the magnitude of the induced 
dipole moment. The induced dipole moment is either based on wave functions, |/z“ d,( ^ M |, 
where /x™ d,< ^ M is from Eq. 23 (top) or the molecular polarizability tensor, |^i“ d ’ aL |, where 
H™ d ’ aL is from Eq. 24 (bottom). The range of H po1 is either 5 po1 < 0 kcal/mol (left) or -50 
kcal/mol < H po1 < 0 kcal/mol (right). 
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In addition to the strong relationship between the ligand polarization energy S po1 and 
both the magnitude of the electric held and the induced dipole, there is also a clear corre¬ 
spondence between the ligand polarization energy S po1 and the classical polarization energy 


S p °i’CL (Fig. S8). The clear correlation between the two quantities suggests that the classical 
model of the ligand as a single dipole in an electric held is a reasonable explanation for the 
quantum behavior. In contrast, the correlation between S po1 and S pol,cA is much weaker, 
which indicates that the classical model of the ligand as a set of atom-centered dipoles is a 
poor description of the quantum phenomenon. The correlation is stronger between S po1 and 
S p °i’CL than between S po1 and 2 pol ’ cL ’" L because the molecular polarizability model does not 


perfectly capture the induced dipole moment (Fig. S6). 
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Figure S8: The ligand polarization energy, S po1 , as a function of the classical polarization 
energy. The classical polarization energy is either S poI,cL (Eq. 22), using Eq. 23 for the 
induced dipole moment (top), £p o1 > cL >«l (Eq. 22), using Eq. 24 for the induced dipole 
moment (middle), or S pol,cA (Eq. 27). The range of 5 po1 is either S po1 < 0 kcal/mol (left) or 
-50 kcal/mol < E po1 < 0 kcal/mol (right). 
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Supporting Information: Limitations of molecular polar¬ 
izability model 


In many cases, the failure of the molecular polarizability to recapitulate the induced dipole 
is due to the location of the ligand center of mass. For most complexes, the magnitude of the 
induced dipole moment based on the molecular polarizability tensor, |E™^“ i | is comparable 
to the magnitude of the induced dipole from the quantum mechanical operator, |//™ d,< ^ M |. 


However, in nearly 8% of complexes, |E™^“ i | is much larger than |/^ d ’ yM |. In many of these 


ind,QM i 


cases, such as 3tsk (Fig. S9), the ligand center of mass is within the protein (Fig. S10). 
Because the ligand center of mass is within the protein, it is very close to embedding held 
charges and the magnitude of the electric held is particularly strong. 
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Figure S9: The structure of the complex 3tsk of human matrix metalloprotease-12 (MMP12) 
in complex with L-glutamate motif inhibitor. The center of mass of the ligand is placed inside 
the protein. 



(a) (b) 

Figure S10: Schematic of protein-ligand complexes in which the ligand center of mass is 
inside the ligand or the protein. The ligand is colored red and protein blue. In (a), the 
center of mass of the ligand is placed inside the ligand, whereas in (b) the center of mass of 
the ligand is inside the protein. 
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Supporting Information: Other Figures 
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Figure SI 1: Histograms of ratio of the polarization energy of the ligand to (a) the electrostatic 
interaction (S elec = E Coul + S po1 ), (b) the intermolecular pairwise potential energy with 
the ligand polarization energy (E pair + S po1 ), (c) the binding energy without considering 
ligand polarization in the solvation free energy (\p bm<lnp _j_ £ pol ) ? and (d) the binding energy 
with considering ligand polarization in the solvation free energy (T bind ). Data are from all 
complexes where S po1 < 0 kcal/mol. The histograms are truncated at a ratio of 1.25. 
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Figure S12: Histograms of intermolecular potential energies in systems where -50 kcal/mol 
< S po1 < 0 kcal/mol: (a) the permanent Coulomb interaction (E Coul ), (b) the electrostatic 
interaction (S elec = E Coul + S po1 ), (c) the intermolecular pairwise potential energy (_F pair = 
E vdW -g E Coul ), and (d) the intermolecular pairwise potential energy with the polarization 
energy of the ligand (_E pair + S po1 ) in the gas phase. Histograms of binding energies: the 
binding energy (e) without considering ligand polarization at all, vj/ bmd > np ; and (f) considering 
ligand polarization for electrostatic interactions but not in the solvation free energy, vJ/ bmd > n P-(- 
S po1 , (g) considering ligand polarization in the solvation free energy but not for electrostatic 
interactions, \H bmd — S po1 , or (h) considering ligand polarization both in the electrostatic 
interactions and the solvation free energy. A similar plot that includes systems containing 
cations is available in the main text. 
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Figure S13: (a) The structure of the complex lulb of bovine pancreatic Ribonuclease A 

with the ligand (S'-phosphothymidine (S'-h^-pyrophosphate adenosine S'-phosphate) and (b) 
the molecular structure of the ligand. 
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Figure S14: Comparison of interaction energies to experimentally measured binding free 
energies for all complexes where S po1 < 0 kcal/mol. Interaction energies are the (a) the 
intermolecular pairwise potential energy (i? pair = _E vdW -g E Coul )-, (b) the intermolecular 
pairwise potential energy with the polarization energy of the ligand (_E pair + S po1 ) in the gas 
phase; the binding energy (c) without considering ligand polarization at all, vj/ bmd > np ; and 
(d) considering ligand polarization for electrostatic interactions but not in the solvation free 
energy, vj/ bind - np + S po1 , (e) considering ligand polarization in the solvation free energy but 
not for electrostatic interactions, vk bind — S po1 , or (f) considering ligand polarization both in 
the electrostatic interactions and the solvation free energy. 
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Supporting Information: Estimated Overpolarization 

When cations are close to ligands, the extent of ligand polarizationAais likely overestimated 
by the QM/MM scheme used in this paper. To assess the extent of overpolarization, we 
performed some calculations in which cations were included in the QM region. In this 
modified scheme, there is no direct way to isolate the ligand polarization energy from energy 
of the complex. Instead, we estimated the induced dipole of the ligand based on RESP 
atomic charges, 


ind,RESP _ \ ''/ QM:Ql QMn 


A&L 


K = ? \1T~" -1a ) r a- 


(31) 


The ligand polarization energy is then computed by, 


^pol,RESP _ 


ind,RESP t-iO 
~l*L ' 


(32) 


where E” is the electric field acting on the center of mass of the ligand. 

In the selected systems where cations are very close to ligand atoms, the ligand polariza¬ 
tion energy estimated with the main QM/MM scheme in this paper is likely too low (Table 
SI). When the only ligand is in the QM region, the estimated ligand polarization energy is 
fairly consistent; S pol (L) ~ S pol,cL (L) ~ S pol,RESP (L). When the QM region is expanded to 
include cations, the ligand polarization energy based on RESP atomic charges is significantly 
higher. For 3dxl and 3dx2, it is about 20 kcal/mol higher. For 2zcq, where S po1 is especially 
low, S pol,RESP (LC) has the opposite sign! 
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Table SI: Dependence of the ligand polarization energy on the QM region. 
S pol (X), S pol,cL (A"), and i? pol ’ RESP (X) are from Eqs. 6, 22, and S2, respectively, 
with the QM region of X. Here, either the ligand (L) or the ligand and cations 
( LC ) are included in the QM region. The unit of the polarization energy is in 
kcal/mol. 


PDB ID 

S pol (L) 

S pol,cL( L ) 

'Z'pol.RESP 

S pol ,kesf( LC -) 

3dxl 

-80.77 

-92.20 

-87.55 

-69.11 

3dx2 

-73.34 

-51.71 

-49.51 

-30.77 

2zcq 

-128.01 

-132.86 

-128.67 

109.02 




Table S2: Complexes with ratios outside of the range of Fig. 7. 


PDB ID 

"pol 

"elec 

"pol ^/"elec 

lmq6 

-10.590 

48.843 

-0.217 

lo3f 

-18.158 

56.933 

-0.319 

loyt 

-18.273 

-3.716 

4.917 

3gc5 

-28.756 

-12.884 

2.232 

3ge7 

-33.854 

2.873 

-11.785 

3gy4 

-19.494 

-12.887 

1.513 

3ui7 

-16.902 

93.232 

-0.181 

3uuo 

-8.104 

108.339 

-0.075 

4abg 

-12.907 

-4.975 

2.594 

4ea2 

-32.932 

21.277 

-1.548 

411x 

-12.174 

-6.190 

1.967 

4mme 

-10.732 

21.803 

-0.492 

4msc 

-18.087 

6.736 

-2.685 

4msn 

-7.198 

8.481 

-0.849 

5clw 

-11.798 

5.292 

-2.229 

5c28 

-10.404 

15.595 

-0.667 

5c2h 

-22.453 

-2.850 

7.878 

PDB ID 

"pol 

Pair _j_ ^pol 

S po1 /(F pair + S po1 ) 

lo3f 

-18.158 

27.304 

-0.665 

3ui7 

-16.902 

51.054 

-0.331 

3uuo 

-8.104 

65.857 

-0.123 

PDB ID 

"pol 

^bind,np _|_ "pol 

"poly'^^bind,np _j_ "pol^ 

2weg 

-68.146 

-53.759 

1.268 

3dxl 

-80.769 

-12.953 

6.235 

3dx2 

-73.340 

-29.526 

2.484 

3kwa 

-77.169 

-45.308 

1.703 



